Abstract

We provide a rate distortion interpretation of the problem of quantum data compression 
of ensembles of mixed states with commuting density operators. There are two versions of 
this problem. In the visible case the sequence of states is available to the encoder and in 
the blind or hidden case the encoder may access only a sequence of measurements. We find 
the exact optimal compression rates for both the visible and hidden cases. Our analysis 
includes the scenario in which asymptotic reconstruction is imperfect. 
1 Introduction



Claude Shannon created the foundations of information theory, a mathematical theory of
communication, in his landmark 1948 paper [1]. However, until fairly recently few attempts were
made to study the transmission and processing of quantum states. The excellent survey paper
[?] provides considerable motivation for the study of quantum information theory. Important
application areas include quantum cryptographic protocols that are more secure than and 
quantum computers that are dramatically faster than their classical counterparts. 

The first problem that Shannon addressed in [1] was the ultimate data compression achievable
on the output of a discrete information source. Shannon initially considered the set of
encoding rules for which the source sequence can be perfectly retrieved from the encoded 
sequence, at least with high probability. For any discrete, stationary, and ergodic source, 
Shannon defined the entropy of the source as a function of the probabilities of the source and 
demonstrated that the minimum achievable average number of code symbols per source symbol 
is the entropy of the source. Later in another paper [3], Shannon also treated the problem of
encoding a source given a fidelity criterion or a measure of the distortion for a representation of 
the source output. The goal in this case is to minimize the expected distortion attainable at a 
particular rate. For a wide class of distortion measures and source models, Shannon provided
a generalization of the source entropy, known as the rate distortion function, which establishes
the exact trade off between the distortion level and the compression rate. 

An important problem in the field of quantum information theory is the generalization 
of classical results on data compression to the quantum domain. To our knowledge, the 
literature thus far treats quantum analogs of discrete, memoryless sources and assumes that 
the reconstruction must have arbitrarily high fidelity in the limit as the source string length 
approaches infinity. 

In order to describe a discrete, memoryless quantum source, we must first define pure and 
mixed quantum states. The state space of a quantum system is a complete description of the 
properties of the particles in the system. It includes information about positions, momenta,
polarizations, spins, and so on. The state space is commonly modelled by a Hilbert space of 
wave functions. The mathematical tools used for the study of quantum information systems are 
finite dimensional complex vector spaces with an inner product that are spanned by abstract 
wave functions. A thorough discussion of mathematical conventions and terminology which 
are standard in quantum mechanics can be found in [?]. In particular, a state is a ray in a
Hilbert space, where a ray is defined as an equivalence class of unit norm vectors that differ
by multiplication by a nonzero complex scalar. If we are looking at a subsystem of a larger
quantum system, then the state of the subsystem is not necessarily a ray. If the state of the 
subsystem is a ray, then the state is called pure and otherwise it is called mixed. When we are 
considering these subsystems, the state of the system is represented by a density operator, i.e., 
a positive semi-definite matrix with unit trace. In the special case of a pure state, the density 
operator is the rank one outer product of the corresponding ray with its conjugate transpose. 
For a mixed state, the density operator is a convex combination of the density operators of 
two or more pure states. 

A discrete, memoryless quantum information source is an ensemble of density operators
$\rho_1, \ldots, \rho_M$ emitted with probabilities $\alpha_1, \ldots, \alpha_M$. Each density operator corresponds to a
pure or a mixed state. The goal of the quantum data compression problem formulated in [?]
is to compress a sequence of pure quantum states into the smallest possible Hilbert space with
arbitrarily good reconstruction fidelity in the limit as the sequence length approaches infinity.
In the special case where the ensemble consists of only pure states, the problem has been solved
in [?], [?], [?]. The more general problem where the ensemble contains at least one mixed state
was first mentioned in [8]. In this case, the optimal compression rate is unknown [?], [10], [11],
but these papers provide upper and lower bounds on the best achievable compression rates.

When the matrices corresponding to the density operators for an ensemble of mixed and/or 
pure states commute, the quantum compression problem has been reformulated in [11] as an
equivalent classical information theory problem in which probability distributions are com- 
pressed and communicated. Our analysis will be in terms of this formulation. The problem of 
optimal mixed state coding has been considered in two different scenarios. In the first case, 
called the visible source case, the encoder knows the precise sequence of states or probability 
distributions that it is transmitting. In the second case, called the hidden source case, the 
encoder only has access to a measurement or "side information" sequence. Each entry of this 
second sequence is found by taking a measurement of the corresponding state; i.e., taking one 
experimental outcome of the probability distribution of the analogous entry in the original 
sequence. Elsewhere in the quantum information literature this is called the blind case, but 
the terminology "hidden" is more standard in the communications literature. References [?],
[10], and [11] provide lower and upper bounds for the optimal rate of asymptotically faithful 
compression which apply to both variants of the problem. 

We provide a rate distortion interpretation of the problem which unifies the analysis of 
both variants and leads to the exact optimal rates for both the visible and blind versions. 
Furthermore, the rate distortion framework leads to a natural generalization of the quantum 
compression problem in which the expected fidelity of reconstruction is asymptotically bounded 
from below but is not necessarily perfect. To our knowledge, this problem has not been 
addressed earlier in the literature. Our techniques provide the optimal compression rate for 
both the visible and blind commuting cases in this setting.

It has come to our attention that [?] presents an alternate proof of the achievability of the
lower bound in the visible, commuting case where reconstruction is asymptotically perfect. 



1.1 Transmitting Probability Distributions 

Suppose that we have an ensemble of M states with the corresponding discrete probability
mass functions $P_1, P_2, \ldots, P_M$ that assume outcome values from the alphabet $\mathcal{Y} = \{1, \ldots, N\}$.
We represent the alphabet $\{1, \ldots, M\}$ by $\mathcal{X}$. Let $p_{i,j}$, $i \in \mathcal{X}$, $j \in \mathcal{Y}$, denote the probability
that a measurement of the $i$th state leads to outcome value $j$. Hence, $p_{i,j} \geq 0$, $i \in \mathcal{X}$, $j \in \mathcal{Y}$,
and $\sum_{j=1}^{N} p_{i,j} = 1$, $i \in \mathcal{X}$.

Assume there is a memoryless source emitting a sequence of the mass functions. In other
words, there is a probability distribution on $\mathcal{X}$ and with probability $\alpha_i$ the source emits state
$i$. The source simultaneously produces a second sequence on $\mathcal{Y}$ which can be viewed as side
information. When the source emits state $i$, it also emits a side-information output symbol
$j \in \mathcal{Y}$ with probability $p_{i,j}$. Let $\{X_\ell\}_{\ell \geq 1}$ and $\{Z_\ell\}_{\ell \geq 1}$ be the output of the source corresponding
to the sequence of states and the sequence of side information, respectively. For the original
problem posed in [11], we wish to consider codes in which a receiver that knows the source
model generates a sequence $\{Y_\ell\}_{\ell \geq 1}$ of output values that fall in the "strongly typical set"
(see, e.g., [13, §13.6]) for the state sequence $\{X_\ell\}_{\ell \geq 1}$. More specifically, for each state $i$ the
relative frequencies of the N output symbols corresponding to the positions where $i$ is the
state emitted from the source should asymptotically converge to the probability mass function
$P_i$ with probability 1. In other words, we measure the fidelity of the output sequence $\{Y_\ell\}_{\ell \geq 1}$
through the empirical distribution of sequences of pairs $\{(X_\ell, Y_\ell)\}_{\ell \geq 1}$. In practice, coding is
performed from finite strings $X^L = X_1, X_2, \ldots, X_L$ to output strings $Y^L = Y_1, Y_2, \ldots, Y_L$.
Pick a block length L and let $P^e_{X^L Y^L}(i,j)$ denote the sample frequency of state and output
pairs $(X_\ell, Y_\ell) = (i,j)$ over the range $\ell \in \{1, \ldots, L\}$. Then for the compression problem with
asymptotically perfect reconstruction we require the Bhattacharyya-Wootters overlap [11, p.
9] of the true probabilities $\alpha_i p_{i,j}$ and the empirical frequencies of the state and output pairs
to be arbitrarily close to 1 in the limit as L approaches infinity. More precisely, we choose our
code to satisfy the constraint

\[ \Pr\left[ \left( \sum_{i \in \mathcal{X}} \sum_{j \in \mathcal{Y}} \sqrt{\alpha_i \, p_{i,j} \, P^e_{X^L Y^L}(i,j)} \right)^2 < 1 - \epsilon \right] < \delta \tag{1} \]

for arbitrarily small positive constants $\delta$ and $\epsilon$ whenever L is sufficiently large. The code may
use probabilistic processes for the encoding and/or decoding. The objective of the encoder is 
to compress the state sequence as much as possible. 

The source model for this problem superficially resembles the composite source models
discussed in [14, §6.1]. The key difference is the reversal of what is viewed as the side information
sequence and what is viewed as the primary source sequence. For this reason, the analysis
techniques developed for that source coding problem do not appear to apply to this setting.

There are two obvious upper bounds to the minimum average number of bits per symbol 
required in the encoding. One of these bounds applies to both the visible and the blind 
versions of the compression problem and the other applies only to the visible case. For the 
visible problem, the encoder may simply transmit the sequence $\{X_\ell\}_{\ell \geq 1}$ and the decoder may
use the appropriate probability mass function every time it receives a state to generate the
output sequence. With this algorithm, the expected number of bits per symbol used by the
encoder can come arbitrarily close to the entropy [1] of the state alphabet:

\[ -\sum_{i=1}^{M} \alpha_i \log_2 \alpha_i. \]

Another possibility for either the blind or the visible case is for the encoder to transmit the
sequence $\{Z_\ell\}_{\ell \geq 1}$ and the decoder to use the sequence without modifying it. The entropy of
this sequence is

\[ -\sum_{j=1}^{N} \left( \sum_{i=1}^{M} \alpha_i p_{i,j} \right) \log_2 \left( \sum_{i=1}^{M} \alpha_i p_{i,j} \right). \]



It is easy to find situations where both of these procedures are suboptimal. Consider the case
where the M probability mass functions are identical, $\alpha_i = 1/M$ for all states $i$, and $p_{i,j} = 1/N$
for all pairs of states $i$ and output symbols $j$. In this case, transmitting the sequence $\{X_\ell\}_{\ell \geq 1}$
will require $\log_2 M$ bits per symbol on average and transmitting the sequence $\{Z_\ell\}_{\ell \geq 1}$ will
require $\log_2 N$ bits per symbol on average. Here the optimal coding procedure for both the
visible and blind versions of the problem would be to have the encoder transmit nothing and
the decoder generate independent and equiprobable output symbols. This coding procedure
uses the ideal of zero bits per symbol.
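
As a quick numerical illustration of this counterexample, the following Python sketch evaluates the two entropy upper bounds and the mutual information between state and measurement outcome for uniform $\alpha_i$ and $p_{i,j}$. The helper names and the choice M = 4, N = 8 are illustrative assumptions, not part of the original development.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def output_distribution(alpha, p):
    """Marginal distribution of the side-information symbol Z."""
    n = len(p[0])
    return [sum(alpha[i] * p[i][j] for i in range(len(alpha))) for j in range(n)]

def mutual_information_bits(alpha, p):
    """I(X;Z) = H(Z) - sum_i alpha_i H(P_i), in bits."""
    h_z = entropy_bits(output_distribution(alpha, p))
    h_z_given_x = sum(a * entropy_bits(row) for a, row in zip(alpha, p))
    return h_z - h_z_given_x

# Hypothetical sizes chosen only for illustration.
M, N = 4, 8
alpha = [1.0 / M] * M                    # uniform prior on the states
p = [[1.0 / N] * N for _ in range(M)]    # identical, uniform mass functions

h_x = entropy_bits(alpha)                          # cost of sending the states
h_z = entropy_bits(output_distribution(alpha, p))  # cost of sending the outcomes
i_xz = mutual_information_bits(alpha, p)           # mutual information lower bound
print(h_x, h_z, i_xz)
```

Both upper bounds are strictly positive here while the mutual information vanishes, which matches the observation that the encoder need not transmit anything.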

It is possible to modify the entropy upper bound for some sources to avoid the simple
counterexample above. Suppose that there are two or more output symbols $j$ which have
a "common randomness;" i.e., for which the $p_{i,j}$ are equal for all $i \in \mathcal{X}$. Then an encoding
strategy would be to introduce an erasure symbol, to replace all occurrences of output symbols
with common randomness in $\{Z_\ell\}_{\ell \geq 1}$ with the erasure symbol, and to encode the resulting
sequence to its entropy. The decoder will not modify the ordinary symbols, and when it sees
an erasure symbol it will generate a symbol of "common randomness" with the appropriate
conditional probability. In the case where $p_{i,j} > 0$ for all pairs $(i,j) \in \mathcal{X} \times \mathcal{Y}$, we will show
that for the blind version of the problem with asymptotically perfect fidelity it is impossible
to do better than this modified entropy bound. Some additional care needs to be provided in
specifying the solution for the blind version of the problem when there are pairs $(i,j) \in \mathcal{X} \times \mathcal{Y}$
with $p_{i,j} = 0$, but the solution is in the form of a mutual information.
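
The erasure strategy can be sketched in a few lines of Python. The sketch assumes one particular reading of "common randomness", namely that an output symbol qualifies when its probability is the same under every state; the function name `erased_entropy_bits` and the two-state example are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def erased_entropy_bits(alpha, p, tol=1e-12):
    """Entropy of Z after common-randomness symbols are merged into one erasure.

    Assumption: symbol j has common randomness when p[i][j] is identical
    for every state i; merging only happens if at least two symbols qualify."""
    m, n = len(p), len(p[0])
    q = [sum(alpha[i] * p[i][j] for i in range(m)) for j in range(n)]
    common = [j for j in range(n)
              if max(p[i][j] for i in range(m)) - min(p[i][j] for i in range(m)) <= tol]
    if len(common) < 2:
        return entropy_bits(q)
    kept = [q[j] for j in range(n) if j not in common]
    erasure = sum(q[j] for j in common)
    return entropy_bits(kept + [erasure])

# Hypothetical two-state source: symbols 2 and 3 do not depend on the state.
alpha = [0.5, 0.5]
p = [[0.4, 0.1, 0.25, 0.25],
     [0.1, 0.4, 0.25, 0.25]]
plain = entropy_bits([sum(alpha[i] * p[i][j] for i in range(2)) for j in range(4)])
merged = erased_entropy_bits(alpha, p)

# Fully uniform source: every symbol is merged and nothing needs to be sent.
uniform = erased_entropy_bits([0.5, 0.5], [[0.25] * 4, [0.25] * 4])
print(plain, merged, uniform)
```

In this example the plain entropy bound is 2 bits per symbol while the modified bound is 1.5 bits, and the bound drops to zero for the fully uniform source.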

References [10] and [11] prove that a lower bound to the optimal compression ratio for both versions of
the problem with asymptotically perfect fidelity is the mutual information between the state
alphabet and the output alphabet

\[ \sum_{i=1}^{M} \sum_{j=1}^{N} \alpha_i p_{i,j} \log_2 p_{i,j} - \sum_{j=1}^{N} \left( \sum_{i=1}^{M} \alpha_i p_{i,j} \right) \log_2 \left( \sum_{i=1}^{M} \alpha_i p_{i,j} \right), \tag{2} \]

but leave open the question whether this lower bound is attainable in either the visible or the
blind variants. We will establish that it is achievable for the visible version of the problem.

Our analysis takes advantage of the tools of rate distortion theory. The quantum information
literature thus far has focused upon the Bhattacharyya-Wootters overlap (see (1)) as
a way to measure the closeness of two probability distributions. This overlap is non-negative
and is equal to one exactly when the two probability distributions are identical. An equivalent
and opposite way to measure the closeness of two probability distributions is to discuss their
"distance" or the distortion generated by approximating one by the other. In this setting,
perfect fidelity corresponds to zero distortion. The Bhattacharyya-Wootters overlap can be
converted into such a distortion function by being subtracted from one. There are many other
examples of interesting distortion functions that appear in the probability and classical information
literature. The advantage of this interpretation is that rate distortion theory has been
studied extensively since [3]. We will show that there is a very simple way to formulate and
solve the problem of compressing probability distributions in the rate distortion setting. It is 
also straightforward to generalize these results to the case where the reconstruction fidelity is 
imperfect. 

2 Preliminaries 

We begin with several basic information-theoretic definitions. Suppose we have two discrete 
and finite random variables X and Y whose joint probability distribution is $P_{XY}$. The entropy
of X and conditional entropy of X given Y are defined as (see [13, Ch. 2])

\[ H(X) = \sum_{x \in \mathrm{supp}(P_X)} -P_X(x) \log P_X(x), \]

\[ H(X|Y) = \sum_{(x,y) \in \mathrm{supp}(P_{XY})} -P_{XY}(x,y) \log P_{X|Y}(x|y), \]

where supp(Px) is the support of Px, i.e., the set of x such that Px(x) > 0. As done here, we 
will continue to write random variables with upper-case letters and values they take on with 
lower-case letters. The informational divergence between Px and Py is defined as 



\[ D(P_X \| P_Y) = \sum_{x \in \mathrm{supp}(P_X)} P_X(x) \log \frac{P_X(x)}{P_Y(x)} \]

and we write $D(P_X \| P_Y) = \infty$ when there is an x in supp(Px) such that Py(x) = 0. The
informational divergence is also called the "information for discrimination," the "relative entropy"
and the "Kullback-Leibler distance" [?, p. 20], [13, p. 18]. The mutual information
between X and Y is defined as 

\begin{align*}
I(X;Y) &= D(P_{XY} \| P_X P_Y) \\
&= H(X) - H(X|Y) \\
&= H(Y) - H(Y|X).
\end{align*}
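
The three expressions for the mutual information can be checked numerically. The sketch below uses a small, hypothetical joint distribution on a binary pair; the function names are our own and the distributions are stored as dictionaries for readability.

```python
import math

def entropy(p):
    """Entropy in bits of a distribution given as {value: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def divergence(p, q):
    """D(p||q) in bits; infinite if q vanishes somewhere on the support of p."""
    total = 0.0
    for x, px in p.items():
        if px > 0:
            qx = q.get(x, 0.0)
            if qx == 0:
                return math.inf
            total += px * math.log2(px / qx)
    return total

def marginal(p_xy, axis):
    """Marginal of a joint distribution {(x, y): prob} on coordinate `axis`."""
    out = {}
    for pair, v in p_xy.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + v
    return out

def conditional_entropy(p_xy, given_axis):
    """Entropy of the other coordinate given the coordinate at `given_axis`."""
    p_given = marginal(p_xy, given_axis)
    return -sum(v * math.log2(v / p_given[pair[given_axis]])
                for pair, v in p_xy.items() if v > 0)

# Hypothetical joint distribution used only for illustration.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x, p_y = marginal(p_xy, 0), marginal(p_xy, 1)
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

i_div = divergence(p_xy, product)                               # D(P_XY || P_X P_Y)
i_x = entropy(p_x) - conditional_entropy(p_xy, given_axis=1)    # H(X) - H(X|Y)
i_y = entropy(p_y) - conditional_entropy(p_xy, given_axis=0)    # H(Y) - H(Y|X)
print(i_div, i_x, i_y)
```

All three values agree up to floating-point rounding, and they are strictly positive because the chosen X and Y are dependent.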

A well-known property of these quantities is that they are all non-negative [13, Ch. 2].
Furthermore, $D(P_X \| P_Y) = 0$ if and only if Px(x) = Py(x) for all x in supp(Px). This implies
that I(X;Y) = 0 if and only if X and Y are statistically independent. Two other important
properties involving convexity are given as lemmas.



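Lemma 2.1 ([13, p. 30]). $D(P_X \| P_Y)$ is convex in the pair $(P_X, P_Y)$, i.e., if $(P_{X_\ell}, P_{Y_\ell})$,
$\ell = 1, 2, \ldots, L$, are pairs of distributions, then for any nonnegative $\lambda_\ell$ which sum to one we
have

\[ \sum_{\ell=1}^{L} \lambda_\ell D(P_{X_\ell} \| P_{Y_\ell}) \geq D\left( \sum_{\ell=1}^{L} \lambda_\ell P_{X_\ell} \,\Big\|\, \sum_{\ell=1}^{L} \lambda_\ell P_{Y_\ell} \right). \tag{3} \]

Equivalently, we can view $D(P_X \| P_Y)$ as a function of $P_{XY}$ and say that $D(P_X \| P_Y)$ is convex
in $P_{XY}$.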

Let J be a random variable taking on the value $\ell$ with probability $\lambda_\ell$, $\ell = 1, \ldots, L$. We can
write (3) as

\[ E_J\left[ D(P_{X_J} \| P_{Y_J}) \right] \geq D\left( E_J[P_{X_J}] \,\|\, E_J[P_{Y_J}] \right), \tag{4} \]

where $E_J[\cdot]$ denotes expectation with respect to the random variable J. We will sometimes
drop the subscript J and write $E[\cdot]$ if it is clear with respect to which random variable we are
taking expectations.
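
The convexity of the informational divergence is easy to probe numerically. The following sketch evaluates both sides of the convexity inequality for three hypothetical pairs of distributions on a ternary alphabet; the weights and distributions are arbitrary choices made for illustration.

```python
import math

def divergence(p, q):
    """D(p||q) in bits for probability vectors with q > 0 on the support of p."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def mix(weights, dists):
    """Convex combination of probability vectors."""
    return [sum(w * d[k] for w, d in zip(weights, dists))
            for k in range(len(dists[0]))]

# Hypothetical weights and distribution pairs.
lam = [0.2, 0.5, 0.3]
p_list = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
q_list = [[0.5, 0.25, 0.25], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]

# Left side: average of the divergences; right side: divergence of the averages.
lhs = sum(w * divergence(p, q) for w, p, q in zip(lam, p_list, q_list))
rhs = divergence(mix(lam, p_list), mix(lam, q_list))
print(lhs, rhs)
```

The averaged divergence on the left is never smaller than the divergence of the averaged distributions on the right.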

Lemma 2.2 ([13, p. 31]). The mutual information I(X;Y) is concave in $P_X$ when $P_{Y|X}$ is
fixed, and convex in $P_{Y|X}$ when $P_X$ is fixed. In other words, we have

\[ E_J\left[ I(X_J; Y_J) \right] \leq I\left( E_J[X_J]; E_J[Y_J] \right) \]

when $P_{Y_J|X_J}$ is the same for all J, and

\[ E_J\left[ I(X_J; Y_J) \right] \geq I\left( E_J[X_J]; E_J[Y_J] \right) \tag{5} \]

when $P_{X_J}$ is the same for all J. Here $I(E_J[X_J]; E_J[Y_J])$ denotes the mutual information
computed from the averaged joint distribution $E_J[P_{X_J Y_J}]$.

Our distortion measures will be defined in terms of the empirical probability distribution
of finite-length sequences or strings. The empirical probability distribution of the length-L
string $x^L = x_1, x_2, \ldots, x_L$ with $x_\ell \in \mathcal{X}$ is defined as

\[ P^e_{x^L}(a) = \frac{n_{x^L}(a)}{L} \quad \text{for all } a \in \mathcal{X}, \]

where $n_{x^L}(a)$ is the number of occurrences of the letter a in the string $x^L$ [?, p. 29], [13, p.
279]. A simple yet important property of $P^e_{x^L}$ is given by the following lemma.

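Lemma 2.3.

\[ E_{X^L}\left[ P^e_{X^L} \right] = \frac{1}{L} \sum_{\ell=1}^{L} P_{X_\ell}. \tag{6} \]

Proof. We have, for all $a \in \mathcal{X}$,

\[ P^e_{X^L}(a) = \frac{n_{X^L}(a)}{L} = \frac{1}{L} \sum_{\ell=1}^{L} 1(X_\ell = a), \]

where $1(\cdot)$ is the indicator function that is 1 if its argument is true and is 0 otherwise. Since
the expectation of a sum is the sum of the expectations [17, p. 10], we have

\[ E_{X^L}\left[ P^e_{X^L}(a) \right] = \frac{1}{L} E_{X^L}\left[ \sum_{\ell=1}^{L} 1(X_\ell = a) \right] = \frac{1}{L} \sum_{\ell=1}^{L} E_{X^L}\left[ 1(X_\ell = a) \right] = \frac{1}{L} \sum_{\ell=1}^{L} P_{X_\ell}(a). \]

□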
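
Lemma 2.3 can be verified exactly for short strings by enumerating every string and weighting its empirical distribution by its probability. The sketch below assumes, purely for illustration, three independent letters with different marginals on a binary alphabet; the variable names are hypothetical.

```python
from itertools import product

# Hypothetical per-position marginals P_{X_1}, P_{X_2}, P_{X_3}.
marginals = [
    {"a": 0.5, "b": 0.5},
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
]
L = len(marginals)
alphabet = ["a", "b"]

def empirical(string):
    """Empirical distribution (relative letter frequencies) of a string."""
    return {s: string.count(s) / len(string) for s in alphabet}

# Expected empirical distribution, computed by exhaustive enumeration.
expected = {s: 0.0 for s in alphabet}
for string in product(alphabet, repeat=L):
    prob = 1.0
    for letter, pmf in zip(string, marginals):
        prob *= pmf[letter]          # independence is assumed in this example
    for s, freq in empirical(string).items():
        expected[s] += prob * freq

# Right side of the lemma: the average of the per-position marginals.
average = {s: sum(pmf[s] for pmf in marginals) / L for s in alphabet}
print(expected, average)
```

The two dictionaries coincide up to rounding. Independence is used here only to build the joint distribution; the lemma itself relies only on linearity of expectation.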



2.1 Rate Distortion Theory 

Figure 1: Model for the rate distortion problem.

Figure 2: Model for the rate distortion problem with a hidden source.

We describe the rate distortion problem as considered by Shannon [3] (see Fig. 1). A discrete
memoryless source (DMS) produces a message string $X^L$ of L independent and identically
distributed letters from a finite alphabet $\mathcal{X}$. $X^L$ is encoded into one of $K = 2^{LR}$ received
strings $Y^L$ of L letters from a finite alphabet $\mathcal{Y}$. The rate of the encoder is thus R bits per
letter, because one can represent any $y^L$ by a string of LR bits. There is a distortion measure
$d(\cdot, \cdot)$ that associates a non-negative number d(x,y) with each pair (x,y) of message letter x
and received letter y. The distortion between the strings $x^L$ and $y^L$ is defined as the average of
the letter-to-letter distortions:

\[ d(x^L, y^L) = \frac{1}{L} \sum_{\ell=1}^{L} d(x_\ell, y_\ell), \]

where we have abused notation by using the same symbol d for the letter-to-letter and string 
distortions. Shannon generalized the letter-to-letter distortion measure in [3], but we will not
be using that generalization here. 

The fundamental problem of rate distortion theory is to determine the minimum code rate
R such that the average distortion between $X^L$ and $Y^L$ is upper bounded by some number $\Delta$.
The rate distortion function $R(\Delta)$ is thus defined as the greatest lower bound on R such that
$E[d(X^L, Y^L)] \leq \Delta$. Shannon showed that $R(\Delta)$ has the simple form given by the following
lemma.

Lemma 2.4 (Shannon [3]). The rate distortion function of a DMS with distribution $P_X$ and
letter-to-letter distortion measure $d(\cdot,\cdot)$ is

\[ R(\Delta) = \min_{P_{Y|X} :\, E[d(X,Y)] \leq \Delta} I(X;Y). \]



The achievability of the rate distortion function is usually demonstrated by choosing a random
code as follows: the L letters of each of the $2^{LR}$ code words are chosen independently using
$P_Y$. One then associates the "typical" strings $x^L$, i.e., those $x^L$ for which $P^e_{x^L}$ is close to $P_X$,
with one of the code words $y^L$ for which $P^e_{x^L y^L}$ is close to $P_{XY}$, where $P^e_{x^L y^L}$ is the empirical
distribution of the L pairs $(x_\ell, y_\ell)$. One can show that if $R > R(\Delta)$ and L is large, such a code
word $y^L$ exists and $d(x^L, y^L) \leq \Delta$ with high probability.
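
The minimization in Lemma 2.4 can be carried out numerically. The sketch below uses the Blahut-Arimoto iteration, which is a standard numerical method and not the technique used in this paper, and applies it to a hypothetical binary source with Hamming distortion, for which the rate distortion function is known to be $h(p) - h(\Delta)$ for small $\Delta$. The slope parameter `beta` and all names are assumptions made for the example.

```python
import math

def h(a):
    """Binary entropy function in bits."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def blahut_arimoto(p_x, dist, beta, iterations=2000):
    """Return (distortion, rate in bits) for one point on the R(Delta) curve.

    p_x  : source distribution
    dist : dist[x][y] is the letter-to-letter distortion d(x, y)
    beta : positive slope parameter selecting the point on the curve"""
    n_x, n_y = len(p_x), len(dist[0])
    q_y = [1.0 / n_y] * n_y
    for _ in range(iterations):
        # Update the test channel P(y|x) for the current output distribution.
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            s = sum(w)
            cond.append([v / s for v in w])
        # Update the output distribution induced by the test channel.
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(p_x[x] * cond[x][y] * dist[x][y]
                     for x in range(n_x) for y in range(n_y))
    rate = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q_y[y])
               for x in range(n_x) for y in range(n_y) if cond[x][y] > 0)
    return distortion, rate

# Hypothetical example: P(X = 1) = 0.3 with Hamming distortion.
p_x = [0.7, 0.3]
hamming = [[0.0, 1.0], [1.0, 0.0]]
delta, rate = blahut_arimoto(p_x, hamming, beta=3.0)
closed_form = h(0.3) - h(delta)
print(delta, rate, closed_form)
```

After convergence the computed rate agrees with the closed-form expression evaluated at the computed distortion.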

A generalization of the rate distortion problem was given by Dobrushin and Tsybakov
in [15] (see Fig. 2). The new twist is that the encoder sees only a noisy version $V^L$ of the
message $X^L$, where $V_\ell$ is generated by $X_\ell$ via the memoryless channel $P_{V|X}(v_\ell | x_\ell)$ for all $\ell$.
The DMS is sometimes called a "remote source" [?, p. 78], [?, p. 136]. We will call the DMS
a hidden source, $X^L$ the hidden source string, $V^L$ the visible source string and $P_V$ the visible
distribution. Note that if V = X we have the original rate distortion problem.

The goal is again to determine the minimum code rate R such that the average distortion
between $X^L$ and $Y^L$ is upper bounded by some number $\Delta$. The rate distortion function $R(\Delta)$
is thus defined as before, and Dobrushin and Tsybakov proved the following lemma.



Lemma 2.5 (Dobrushin and Tsybakov [15]). The rate distortion function of a hidden
DMS with distribution $P_X$, visible distribution $P_V$, and single-letter distortion measure $d(\cdot,\cdot)$
is

\[ R(\Delta) = \min_{P_{Y|V} :\, E[d(X,Y)] \leq \Delta} I(V;Y). \]

Note that

\begin{align*}
I(V;Y) &= H(Y) - H(Y|V) \\
&= H(Y) - H(Y|VX) \\
&\geq H(Y) - H(Y|X) \\
&= I(X;Y),
\end{align*}

where the second equality follows because Y is independent of X given V, and the inequality
follows because conditioning cannot increase entropy [13, p. 27]. Thus, not surprisingly, the
best rate when $X^L$ is hidden is at least as large as when $X^L$ is visible.
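
The inequality $I(V;Y) \geq I(X;Y)$ for a chain in which Y depends on X only through V can be illustrated with a small numerical example. The two binary channels below are hypothetical and serve only to instantiate the chain.

```python
import math

def mutual_information(p_a, chan):
    """I(A;B) in bits for input pmf p_a and channel chan[a][b] = P(b|a)."""
    n_b = len(chan[0])
    p_b = [sum(p_a[a] * chan[a][b] for a in range(len(p_a))) for b in range(n_b)]
    return sum(p_a[a] * chan[a][b] * math.log2(chan[a][b] / p_b[b])
               for a in range(len(p_a)) for b in range(n_b) if chan[a][b] > 0)

def compose(first, second):
    """Channel obtained by passing the output of `first` through `second`."""
    return [[sum(first[a][v] * second[v][b] for v in range(len(second)))
             for b in range(len(second[0]))] for a in range(len(first))]

# Hypothetical hidden source X, observation channel P(v|x) and code P(y|v).
p_x = [0.5, 0.5]
p_v_given_x = [[0.9, 0.1], [0.2, 0.8]]
p_y_given_v = [[0.85, 0.15], [0.3, 0.7]]

p_v = [sum(p_x[x] * p_v_given_x[x][v] for x in range(2)) for v in range(2)]
i_vy = mutual_information(p_v, p_y_given_v)
i_xy = mutual_information(p_x, compose(p_v_given_x, p_y_given_v))
print(i_vy, i_xy)
```

The first value is at least as large as the second, in line with the chain of relations above.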

Lemma 2.6. The random variables of the rate distortion problem with a hidden source satisfy

\[ H(Y^L) \geq I(V^L; Y^L) \geq \sum_{\ell=1}^{L} I(V_\ell; Y_\ell) \geq L \cdot I(V;Y), \tag{7} \]

where $P_{VY} = \sum_{\ell=1}^{L} P_{V_\ell Y_\ell} / L$.

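Proof. The first inequality follows by the non-negativity of $H(Y^L | V^L)$. In fact, $Y^L$ is usually
a function of $V^L$ so that $H(Y^L | V^L) = 0$ and $H(Y^L) = I(V^L; Y^L)$. The second inequality
follows by

\begin{align*}
I(V^L; Y^L) &= \sum_{\ell=1}^{L} H(V_\ell | V^{\ell-1}) - H(V_\ell | Y^L V^{\ell-1}) \\
&= \sum_{\ell=1}^{L} H(V_\ell) - H(V_\ell | Y^L V^{\ell-1}) \\
&\geq \sum_{\ell=1}^{L} H(V_\ell) - H(V_\ell | Y_\ell),
\end{align*}

where the second step uses the independence of the $V_\ell$ and the last step uses the fact that
conditioning cannot increase entropy. The third inequality follows by viewing the sum over the
$I(V_\ell; Y_\ell)$ as L times $E_J[I(V_J; Y_J)]$, where J takes on the value $\ell$ with probability 1/L for
$\ell = 1, \ldots, L$. Since $P_{V_\ell}$ is the same for all $\ell$, the bound (5) then gives
the desired result. □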

3 Quantum Rate Distortion 

We deal with the visible and hidden (or blind) source problems simultaneously by introducing
an auxiliary string $Z^L$ to the model of Fig. 2 (see Fig. 3). $Z^L$ represents the outcomes of
measurements and is called side information in Section 1.1. The terms of $Z^L$ take on values
in the alphabet $\mathcal{Z}$ and are generated together with $V^L$ as outputs of a memoryless channel
$P_{VZ|X}$.

Figure 3: Model for the quantum rate distortion problem.

We are interested in string distortion measures $d(\cdot, \cdot)$ that depend on $(x^L, y^L)$ only through
the empirical distribution $P^e_{x^L y^L}$. Thus, with some abuse of notation we can write
$d(x^L, y^L) = d(P^e_{x^L y^L})$. For example, using (1) the Bhattacharyya-Wootters distortion measure could be
defined as

\[ d(P^e_{x^L y^L}) = 1 - \left[ \sum_{x \in \mathcal{X},\, z \in \mathcal{Z}} \sqrt{P_{XZ}(x,z) \, P^e_{x^L y^L}(x,z)} \right]^2, \tag{8} \]




where Z plays the role of the measurement outcomes in Section 1.1. The visible case has
V = X while the hidden case can have $V \neq X$ and has V = Z. As a second example, a
natural information-theoretic distortion measure is the informational divergence

\[ d(P^e_{x^L y^L}) = D\left( P^e_{x^L z} \,\|\, P^e_{x^L y^L} \right), \tag{9} \]

where $P^e_{x^L z}(a,b)$ is defined as $\sum_{c \in \mathcal{Y}} P^e_{x^L y^L}(a,c) \, P_{Z|X}(b|a)$ for all $a \in \mathcal{X}$ and $b \in \mathcal{Z}$, i.e.,
$P^e_{x^L z} = P^e_{x^L} P_{Z|X}$. Observe that low distortion is achieved only if the empirical distribution of
$(x^L, y^L)$ is close to the desired distribution $P^e_{x^L z}$.

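We next impose an additional restriction on $d(\cdot)$, namely that $d(P_{XY})$ be convex in $P_{XY}$,
i.e.,

\[ E_J\left[ d(P_{X_J Y_J}) \right] \geq d\left( E_J[P_{X_J Y_J}] \right), \tag{10} \]

where J is a finite random variable. The distortion measure (9) meets this requirement by
Lemma 2.1. The distortion measure (8) also meets this requirement since, for $\lambda_\ell \geq 0$ and
$\sum_\ell \lambda_\ell = 1$, we have

\begin{align*}
1 - \left[ \sum_{x,z} \sqrt{\sum_\ell \lambda_\ell a_\ell(x,z)} \right]^2
&\leq 1 - \sum_\ell \left[ \sum_{x,z} \sqrt{\lambda_\ell a_\ell(x,z)} \right]^2 \\
&= \sum_\ell \lambda_\ell \left( 1 - \left[ \sum_{x,z} \sqrt{a_\ell(x,z)} \right]^2 \right),
\end{align*}

where $a_\ell(x,z) = P_{XZ}(x,z) \, P_{X_\ell Y_\ell}(x,z)$ and where the first step follows by Minkowski's inequality
[?, p. 523].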
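
The convexity requirement can be checked numerically for both example distortion measures. The sketch below assumes the Bhattacharyya-Wootters distortion is one minus the squared overlap $\left[\sum_{x,z} \sqrt{P_{XZ}(x,z) P_{XY}(x,z)}\right]^2$ and that the informational divergence distortion is $D(P_X P_{Z|X} \| P_{XY})$; the source, the measurement channel and the two joint distributions being mixed are hypothetical.

```python
import math

# Hypothetical source and measurement channel on binary alphabets.
p_x = [0.5, 0.5]
p_z_given_x = [[0.9, 0.1], [0.3, 0.7]]
p_xz = [[p_x[x] * p_z_given_x[x][z] for z in range(2)] for x in range(2)]

def bw_distortion(q_xy):
    """One minus the squared overlap between P_XZ and the joint q_xy (assumed form)."""
    overlap = sum(math.sqrt(p_xz[x][z] * q_xy[x][z])
                  for x in range(2) for z in range(2))
    return 1.0 - overlap ** 2

def id_distortion(q_xy):
    """D(q_X P_{Z|X} || q_XY) in bits, where q_X is the X-marginal of q_xy."""
    total = 0.0
    for x in range(2):
        q_x = sum(q_xy[x])
        for z in range(2):
            target = q_x * p_z_given_x[x][z]
            if target > 0:
                total += target * math.log2(target / q_xy[x][z])
    return total

def mix(lam, q1, q2):
    """Convex combination of two joint distributions."""
    return [[lam * q1[x][z] + (1 - lam) * q2[x][z] for z in range(2)]
            for x in range(2)]

q1 = [[0.4, 0.1], [0.2, 0.3]]
q2 = [[0.3, 0.3], [0.1, 0.3]]
lam = 0.35

# gap >= 0 means: average distortion >= distortion of the averaged distribution.
gaps = {}
for name, d in (("BW", bw_distortion), ("ID", id_distortion)):
    gaps[name] = lam * d(q1) + (1 - lam) * d(q2) - d(mix(lam, q1, q2))
print(gaps, bw_distortion(p_xz), id_distortion(p_xz))
```

Both gaps are non-negative, and both distortions vanish when the joint distribution equals $P_{XZ}$.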

We call the problem of finding the rate distortion function for this set-up the quantum 
commuting density operator (quantum CDO) rate distortion problem. The following lemma 
gives a lower bound on the rate distortion function. 

Lemma 3.1 (Rate Lower Bound). The rate R of the quantum CDO rate distortion problem
with expected distortion $E[d(P^e_{X^L Y^L})] = \Delta$ satisfies

\[ R \geq \min_{P_{Y|V} :\, d(P_{XY}) \leq \Delta} I(V;Y). \tag{11} \]
Proof. A simple upper bound on $H(Y^L)$ is the logarithm of the number of possible values $Y^L$
takes on with nonzero probability [1, Sec. 6], i.e., the logarithm of the number of code words.
We thus have

\[ R \geq H(Y^L)/L \geq I(V;Y) \geq \min_{P_{Y^L|V^L} :\, E[d(X^L, Y^L)] \leq \Delta} I(V;Y), \]

where the second inequality follows by (7), and the third inequality because of the minimization.
Next, we have

\[ E\left[ d(P^e_{X^L Y^L}) \right] \geq d\left( E[P^e_{X^L Y^L}] \right) = d(P_{XY}), \]

where $P_{XY} = \sum_{\ell=1}^{L} P_{X_\ell Y_\ell} / L$. The inequality follows by the convexity of $d(\cdot)$ and the equality
by Lemma 2.3. Thus, the condition $E[d(X^L, Y^L)] \leq \Delta$ implies that $d(P_{XY}) \leq \Delta$, and we have

\[ R \geq \min_{P_{Y^L|V^L} :\, d(P_{XY}) \leq \Delta} I(V;Y). \]

This is the same as (11) because the minimization over $P_{Y^L|V^L}$ is the same as the minimization
over $P_{Y|V}$. □



We next show that the lower bound of Lemma 3.1 can be approached arbitrarily closely, 
and is thus the desired rate distortion function. 

Lemma 3.2 (Achievable Rates). For any $\delta > 0$ and distortion $\Delta$ there exists a block code
of sufficiently large block length for which

\[ R < \min_{P_{Y|V} :\, d(P_{XY}) \leq \Delta} I(V;Y) + \delta. \]

Proof. We give only a very brief sketch of the proof for this preliminary version of the paper.
The code is generated by choosing some $P_{Y|V}$ and then randomly choosing each symbol of
the $2^{LR}$ code words independently using the resulting $P_Y$. Let the kth code word be $y_k^L$ and
choose some $\epsilon > 0$. For each $v^L$ satisfying $|P^e_{v^L}(a) - P_V(a)| < \epsilon$ for all a, one looks for a code
word $y_k^L$ such that $|P^e_{v^L y_k^L}(a,b) - P_{VY}(a,b)| < \epsilon$ for all a and b. Let $\mathcal{E}_k(v^L)$ be the event that
the kth code word $Y_k^L$, now regarded as a random variable, is such a code word. Lemma 13.6.2
in [13, p. 359] assures us that

\[ 2^{-L[I(V;Y) + \epsilon_1]} \leq \Pr\left[ \mathcal{E}_k(v^L) \right] \leq 2^{-L[I(V;Y) - \epsilon_1]}, \]

where $\epsilon_1 \to 0$ as $\epsilon \to 0$ and $L \to \infty$. Continuing as in [13, Sec. 13.6], one will need
$K \approx 2^{L I(V;Y)}$ code words to ensure that $\mathcal{E}_k(v^L)$ occurs for at least one k for all the "typical"
$v^L$. One can also use the approach in [13, Sec. 13.6] to show that the distortion criterion is
met for each such $(v^L, y_k^L)$ pair with high probability.

The code construction we have just described can be done for any $P_{Y|V}$, so we choose that
$P_{Y|V}$ which minimizes the rate I(V;Y). □

Theorem 3.3. The rate distortion function of the quantum CDO rate distortion problem is

\[ R(\Delta) = \min_{P_{Y|V} :\, d(P_{XY}) \leq \Delta} I(V;Y). \]

Figure 4: Rate distortion function for a visible source.

3.1 Examples 

We give examples to demonstrate how one can apply the above results. Consider Example 8 of
[11] in which the states $\rho_1 = \mathrm{diag}(\alpha_1, 1 - \alpha_1)$ and $\rho_2 = \mathrm{diag}(\alpha_2, 1 - \alpha_2)$ have prior probabilities
$p$ and $1 - p$, respectively, where diag(a, b) is a diagonal matrix with entries a and b. In [11] it is
shown that one may regard the two states as biased coins $c_1$ and $c_2$ that take on the value H
(for heads) with respective probabilities $\alpha_1$ and $\alpha_2$, and the value T (for tails) with respective
probabilities $1 - \alpha_1$ and $1 - \alpha_2$. Adapting this to Fig. 3, we let $X^L$ be the sequence of coins and
$Z^L$ a sequence of outcomes of coin tosses, i.e., $P_X(c_1) = p$, $P_X(c_2) = 1 - p$, $P_{Z|X}(H|c_1) = \alpha_1$,
$P_{Z|X}(H|c_2) = \alpha_2$, and so on.

Consider the visible case where V = X, so that the rate distortion function is

\[ R(\Delta) = \min_{P_{Y|X} :\, d(P_{XY}) \leq \Delta} I(X;Y). \]

If $\Delta = 0$ then $P_{XY} = P_{XZ}$ for both the Bhattacharyya-Wootters and the informational
divergence distortion measures. Thus, we have

\[ I(X;Y) = I(X;Z) = h(p\alpha_1 + (1-p)\alpha_2) - \left[ p\, h(\alpha_1) + (1-p)\, h(\alpha_2) \right], \]

where $h(a) = -a \log_2(a) - (1-a) \log_2(1-a)$ is the binary entropy function [?, Fig. 7]. For
a concrete example, set $p = 1/2$, $\alpha_1 = 1/10$ and $\alpha_2 = 9/10$. Then $I(X;Y) \approx 0.5310$ is the
ultimate limit on data compression with no distortion; Fig. 4 shows $R(\Delta)$ as a function of
$\Delta$ for both the Bhattacharyya-Wootters (BW) and informational divergence (ID) distortion
measures. Observe that $R(\Delta)$ is convex [?].
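
The zero-distortion rate in the visible coin example can be reproduced with a few lines of Python. The function names below are our own; the parameter values are those quoted in the text.

```python
import math

def h(a):
    """Binary entropy function in bits."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def visible_rate_zero_distortion(p, a1, a2):
    """I(X;Z) for two biased coins with heads probabilities a1, a2 and prior p."""
    return h(p * a1 + (1 - p) * a2) - (p * h(a1) + (1 - p) * h(a2))

# Parameters of the concrete example: p = 1/2, alpha_1 = 1/10, alpha_2 = 9/10.
rate = visible_rate_zero_distortion(0.5, 0.1, 0.9)
print(rate)
```

The computed value agrees with the figure of roughly 0.5310 bits per symbol stated above.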

Consider next the hidden source case (or blind case) where V = Z. We thus have

\[ R(\Delta) = \min_{P_{Y|Z} :\, d(P_{XY}) \leq \Delta} I(Z;Y). \]

Figure 5: Rate distortion function for a hidden source.



Again, if $\Delta = 0$ then $P_{XY} = P_{XZ}$ for both the Bhattacharyya-Wootters and the informational
divergence distortion measures. Performing the optimization, we find that R(0) can be a
discontinuous function of $\alpha_2$: for $\alpha_2 \neq \alpha_1$ we have $R(0) = H(Z) = h(p\alpha_1 + (1-p)\alpha_2)$ and
for $\alpha_2 = \alpha_1$ we have R(0) = 0. For example, suppose that $p = 1/2$ and $\alpha_1 = 1/3$. We plot
$R(\Delta)$ as a function of $\alpha_2$ for various $\Delta$ and the Bhattacharyya-Wootters distortion measure
in Fig. 5. Observe that as $\Delta \to 0$ we will have a discontinuity at $\alpha_2 = 1/3$. In practice, this
discontinuity does not occur because $\Delta = 0$ is impossible for finite block lengths. Furthermore,
if $\Delta$ is not too small, say $\Delta = 10^{-3}$, then for many $\alpha_2$ one can achieve substantially better
compression rates than R(0).
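
The discontinuity of R(0) in the hidden case can be made concrete with a short sketch. It simply evaluates the two-branch expression stated above for $p = 1/2$ and $\alpha_1 = 1/3$; the function name and the perturbation size are illustrative choices.

```python
import math

def h(a):
    """Binary entropy function in bits."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def hidden_rate_zero_distortion(p, a1, a2):
    """R(0) for the hidden (blind) coin example: H(Z) unless the coins coincide."""
    if a1 == a2:
        return 0.0
    return h(p * a1 + (1 - p) * a2)

p, a1 = 0.5, 1.0 / 3.0
at_equal = hidden_rate_zero_distortion(p, a1, a1)           # coins identical
near_equal = hidden_rate_zero_distortion(p, a1, a1 + 1e-6)  # coins barely differ
print(at_equal, near_equal)
```

An arbitrarily small difference between the two coins moves R(0) from zero to about h(1/3), which is roughly 0.918 bits per symbol.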



4 Conclusions 

The problem of determining optimal compression limits for quantum information has recently 
generated considerable interest. In the special case of an ensemble of mixed states with com- 
muting density operators, we use rate distortion theory to find the optimal rates in both the 
visible and blind versions of the problem. We also generalize this special case of the quantum 
compression problem to the setting where the reconstruction is not faithful. 